WASPBENCH: a lexicographer's workbench incorporating state-of-the-art word sense disambiguation
Abstract
Human Language Technologies (HLT) need dictionaries, to tell them what words mean and how they behave. People making dictionaries (lexicographers) need HLT, to help them identify how words behave so they can make better dictionaries. Thus a potential for synergy exists across the range of lexical data in the construction of headword lists, for spelling correction, phonetics, morphology and syntax, but nowhere more than for semantics, and in particular the vexed question of how a word's meaning should be analysed into distinct senses. HLT needs all the help it can get from dictionaries, because it is a very hard problem to identify which meaning of a word applies. Lexicographers need all the help they can get because the analysis of meaning is the second hardest part of their job (Kilgarriff, 1998), it occupies a large share of their working hours, and it is one where, currently, they have very little to go on beyond intuition and other dictionaries. Thus HLT system developers and corpus lexicographers can both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly for lexicons for Machine Translation. We have developed the WASPBENCH, a tool that (1) presents a "word sketch", a summary of the corpus evidence for a word, to the lexicographer; (2) supports the lexicographer in analysing the word into its distinct meanings; and (3) uses the lexicographer's analysis as the input to a state-of-the-art word sense disambiguation (WSD) algorithm, the output of which is a "word expert" which can then disambiguate new instances of the word.
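To make the idea of a "word expert" concrete, the following is a minimal sketch of how sense-labelled contexts could be turned into a disambiguator. The abstract does not name the WSD algorithm used, so this sketch assumes a simple decision list over context words; the function names (train_word_expert, disambiguate), the smoothing value and the toy data for "bank" are all hypothetical and are not taken from the WASPBENCH itself.

```python
import math
from collections import Counter, defaultdict


def train_word_expert(labelled_contexts, smoothing=0.1):
    """Build a decision list from (context_words, sense) pairs.

    Assumption: each context word is treated as a clue, scored by the
    smoothed log-ratio of its best sense against all other senses.
    """
    counts = defaultdict(Counter)  # clue -> sense -> number of contexts
    for words, sense in labelled_contexts:
        for clue in set(words):
            counts[clue][sense] += 1

    rules = []
    for clue, by_sense in counts.items():
        best_sense, best_n = by_sense.most_common(1)[0]
        rest_n = sum(by_sense.values()) - best_n
        score = math.log((best_n + smoothing) / (rest_n + smoothing))
        rules.append((score, clue, best_sense))

    # Strongest clues first; ties broken alphabetically for determinism.
    rules.sort(key=lambda rule: (-rule[0], rule[1]))
    return rules


def disambiguate(rules, context_words, default=None):
    """Return the sense of the highest-ranked clue found in the context."""
    present = set(context_words)
    for _score, clue, sense in rules:
        if clue in present:
            return sense
    return default


# Hypothetical lexicographer's analysis of "bank" into two senses.
labelled = [
    (["money", "deposit", "account"], "finance"),
    (["loan", "money", "interest"], "finance"),
    (["river", "water", "fishing"], "river"),
    (["river", "muddy", "steep"], "river"),
]
expert = train_word_expert(labelled)
print(disambiguate(expert, ["the", "muddy", "river"]))
print(disambiguate(expert, ["open", "account", "money"]))
```

In this sketch the lexicographer's sense assignments play the role of training data, and the sorted rule list plays the role of the word expert that is applied to new instances.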
Similar resources
An Evaluation of a Lexicographer's Workbench Incorporating Word Sense Disambiguation
NLP system developers and corpus lexicographers would both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly for lexicons for Machine Translation. We have developed the WASPBENCH, a tool that presents a "word sketch", a summary of the corpus evidence for a word, to...
An Evaluation of a Lexicographer's Workbench: building lexicons For Machine Translation
NLP system developers and corpus lexicographers would both benefit from a tool for finding and organizing the distinctive patterns of use of words in texts. Such a tool would be an asset for both language research and lexicon development, particularly for lexicons for Machine Translation (MT). We have developed the WASPBENCH, a tool that (1) presents a "word sketch", a summary of the corpus evi...
WASP-Bench: an MT Lexicographers' Workstation Supporting State-of-the-art Lexical Disambiguation
Most MT lexicography is devoted to developing rules of the kind, “in context C, translate source-language word S as target-language word T”. Very many such rules are required, producing them is laborious, and MT companies standardly spend large sums on it. We present the WASP-Bench, a lexicographer's workstation for the rapid and semi-automatic development of such rule-sets. The WASPBench makes...
WASPBENCH: a lexicographer’s workbench supporting state-of-the-art word sense disambiguation
Human Language Technologies (HLT) need dictionaries, to tell them what words mean and how they behave. People making dictionaries (lexicographers) need HLT, to help them identify how words behave so they can make better dictionaries. Thus a potential for synergy exists across the range of lexical data in the construction of headword lists, for spelling correction, phonetics, morphology and synt...
Word Relatives in Context for Word Sense Disambiguation
The current situation for Word Sense Disambiguation (WSD) is somewhat stuck due to lack of training data. We present in this paper a novel disambiguation algorithm that improves previous systems based on acquisition of examples by incorporating local context information. With a basic configuration, our method is able to obtain state-of-the-art performance. We complemented this work by evaluatin...